{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## <font color='green'>Movielens - Singular Value Decomposition (Sparse Matrix)<font>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### <font color='green'> 1. Description<font>\n",
    "\n",
    "Recommendation using singular value decomposition.\n",
    "\n",
    "The data is taken from: https://grouplens.org/datasets/movielens/25m/\n",
    "It contains 25 million ratings and one million tag applications applied to 62,000 movies by 162,000 users. \n",
    "\n",
    "We used singular value decomposition (SVD) for the recommendation task. From the rating, we created user-item matrix that contains the ratings\n",
    "of the movie by the user. This matrix is quite sparse matrix. We decomposed the matrix into the multiplication of 3 matrices: left singular vectors (U), singular values (s), and right singular vectors (V(transposed)). We can limit the number of singular values (and the colums/rows of U and V) from the larger ones to approximate the matrix. After the approximation, we can infer the non-rated part of the original matrix by multiplying the U, s and V. This is well known collaborative filtering meth\n",
    "\n",
    "In the example result, Japanese animations are recommended to the user who likes such movies."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### <font color='green'> 2. Data Preprocessing<font>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Please uncomment the below lines to download and unzip the dataset.\n",
    "#!wget -N http://files.grouplens.org/datasets/movielens/ml-25m.zip\n",
    "#!unzip -o ml-25m.zip\n",
    "#!mv ml-25m datasets"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "shape of the matrix is (162542, 209172)\n"
     ]
    }
   ],
   "source": [
    "# prepare data\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "from scipy.sparse import coo_matrix\n",
    "from scipy.sparse import csr_matrix\n",
    "\n",
    "df = pd.read_csv(\"datasets/ml-25m/ratings.csv\").drop(['timestamp'], axis=1)\n",
    "# though userId and movieId starts from 1, do not -1 for consistency\n",
    "userId = df['userId'].values\n",
    "movieId = df['movieId'].values\n",
    "rating = df['rating'].values\n",
    "mat = coo_matrix((rating, (userId, movieId))).tocsr()\n",
    "print (\"shape of the matrix is {}\".format(mat.shape))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### <font color='green'> 3. Implementation using Frovedis<font>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "train time: 6.595 sec\n"
     ]
    }
   ],
   "source": [
    "# train\n",
    "import os, time\n",
    "from frovedis.decomposition import TruncatedSVD as frovTruncatedSVD\n",
    "from frovedis.exrpc.server import FrovedisServer\n",
    "FrovedisServer.initialize(\"mpirun -np 8 {}\".format(os.environ['FROVEDIS_SERVER']))\n",
    "\n",
    "svd = frovTruncatedSVD(n_components=300, algorithm='arpack')\n",
    "t1 = time.time()\n",
    "frov_transformed = svd.fit_transform(mat)\n",
    "t2 = time.time()\n",
    "print (\"train time: {:.3f} sec\".format(t2-t1))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "highly rated movies of user 282:\n",
      "                                             title  rate\n",
      "                                      Akira (1988)   5.0\n",
      "                            Iron Giant, The (1999)   5.0\n",
      "                             Boxtrolls, The (2014)   5.0\n",
      "                  Grand Budapest Hotel, The (2014)   5.0\n",
      " From Up on Poppy Hill (Kokuriko-zaka kara) (2011)   5.0\n",
      "                                       Hugo (2011)   5.0\n",
      " Secret World of Arrietty, The (Kari-gurashi no...   5.0\n",
      "                                   Coraline (2009)   5.0\n",
      "                Ponyo (Gake no ue no Ponyo) (2008)   5.0\n",
      "                                     WALL·E (2008)   5.0\n",
      " Girl Who Leapt Through Time, The (Toki o kaker...   5.0\n",
      "          Tekkonkinkreet (Tekkon kinkurîto) (2006)   5.0\n",
      "                         Paprika (Papurika) (2006)   5.0\n",
      "                                 MirrorMask (2005)   5.0\n",
      " Howl's Moving Castle (Hauru no ugoku shiro) (2...   5.0\n",
      " Porco Rosso (Crimson Pig) (Kurenai no buta) (1...   5.0\n",
      " Kiki's Delivery Service (Majo no takkyûbin) (1...   5.0\n",
      "                           Tokyo Godfathers (2003)   5.0\n",
      " Lupin III: The Castle Of Cagliostro (Rupan san...   5.0\n",
      "                            Patema Inverted (2013)   5.0\n",
      "\n",
      "recommended movies:\n",
      "                                             title      rate\n",
      "    Grave of the Fireflies (Hotaru no haka) (1988)  3.563662\n",
      "        Ghost in the Shell (Kôkaku kidôtai) (1995)  2.820755\n",
      "             Wind Rises, The (Kaze tachinu) (2013)  1.446478\n",
      "                          Fantastic Mr. Fox (2009)  1.230561\n",
      "                               Perfect Blue (1997)  1.223616\n",
      "                           Moonrise Kingdom (2012)  1.122188\n",
      "                                         Up (2009)  1.109598\n",
      " Ghost in the Shell 2: Innocence (a.k.a. Innoce...  0.999837\n",
      "           Millennium Actress (Sennen joyû) (2001)  0.883533\n",
      "              Ninja Scroll (Jûbei ninpûchô) (1995)  0.850211\n",
      " Wolf Children (Okami kodomo no ame to yuki) (2...  0.846930\n",
      "                               Corpse Bride (2005)  0.817524\n",
      "                             Animatrix, The (2003)  0.811292\n",
      " Triplets of Belleville, The (Les triplettes de...  0.772407\n",
      "                                 Your Name. (2016)  0.771261\n",
      "                                 Persepolis (2007)  0.765392\n",
      "                    Summer Wars (Samâ wôzu) (2009)  0.764066\n",
      "                            Song of the Sea (2014)  0.739205\n",
      " Neon Genesis Evangelion: The End of Evangelion...  0.739014\n",
      "                                   Iron Man (2008)  0.693377\n"
     ]
    }
   ],
   "source": [
    "# predict\n",
    "from distutils.version import LooseVersion\n",
    "movies = pd.read_csv(\"datasets/ml-25m/movies.csv\")\n",
    "def print_movies(print_df):\n",
    "    merged = pd.merge(movies, print_df, on = 'movieId').sort_values('rate',ascending=False)\n",
    "    to_print = merged[['title','rate']]\n",
    "    if(LooseVersion(pd.__version__) >= LooseVersion(\"1.0.0\")):\n",
    "        print (to_print.to_string(index=False, max_colwidth=50))\n",
    "    else:\n",
    "        print (to_print.to_string(index=False))\n",
    "\n",
    "testId = 282\n",
    "\n",
    "# X ~= USV^T, transformed = US, svd_components_ = V^T\n",
    "infer = np.matmul(frov_transformed[testId,:], svd.components_)\n",
    "movieId = np.arange(infer.shape[0])\n",
    "infer_df = pd.DataFrame({'rate_infer' : infer, 'movieId' : movieId})\n",
    "\n",
    "rated = mat[testId,:]\n",
    "rated_data = rated.data\n",
    "rated_indices = rated.indices\n",
    "rated_df = pd.DataFrame({'rate_rated' : rated_data, 'movieId': rated_indices})\n",
    "rated_sorted = rated_df.sort_values('rate_rated',ascending=False)\n",
    "to_print = rated_sorted.rename(columns={'rate_rated': 'rate'})\n",
    "print(\"highly rated movies of user {}:\".format(testId))\n",
    "print_movies(to_print.head(20))\n",
    "\n",
    "# exclude already rated movies\n",
    "tmp_df = pd.merge(infer_df, rated_df, on='movieId', how='outer')\n",
    "not_rated_df = tmp_df[pd.isnull(tmp_df['rate_rated'])]\n",
    "not_rated_sorted = not_rated_df.sort_values('rate_infer',ascending=False)\n",
    "to_print = not_rated_sorted.rename(columns={'rate_infer': 'rate'})\n",
    "print(\"\")\n",
    "print(\"recommended movies:\")\n",
    "print_movies(to_print.head(20))\n",
    "\n",
    "FrovedisServer.shut_down()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### <font color='green'> 4. Implementation using scikit-learn<font>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "train time: 104.333 sec\n"
     ]
    }
   ],
   "source": [
    "# train\n",
    "from sklearn.decomposition import TruncatedSVD as skTruncatedSVD\n",
    "\n",
    "svd = skTruncatedSVD(n_components=300, algorithm='arpack')\n",
    "t1 = time.time()\n",
    "sk_transformed = svd.fit_transform(mat)\n",
    "t2 = time.time()\n",
    "print (\"train time: {:.3f} sec\".format(t2-t1))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "highly rated movies of user 282:\n",
      "                                             title  rate\n",
      "                                      Akira (1988)   5.0\n",
      "                            Iron Giant, The (1999)   5.0\n",
      "                             Boxtrolls, The (2014)   5.0\n",
      "                  Grand Budapest Hotel, The (2014)   5.0\n",
      " From Up on Poppy Hill (Kokuriko-zaka kara) (2011)   5.0\n",
      "                                       Hugo (2011)   5.0\n",
      " Secret World of Arrietty, The (Kari-gurashi no...   5.0\n",
      "                                   Coraline (2009)   5.0\n",
      "                Ponyo (Gake no ue no Ponyo) (2008)   5.0\n",
      "                                     WALL·E (2008)   5.0\n",
      " Girl Who Leapt Through Time, The (Toki o kaker...   5.0\n",
      "          Tekkonkinkreet (Tekkon kinkurîto) (2006)   5.0\n",
      "                         Paprika (Papurika) (2006)   5.0\n",
      "                                 MirrorMask (2005)   5.0\n",
      " Howl's Moving Castle (Hauru no ugoku shiro) (2...   5.0\n",
      " Porco Rosso (Crimson Pig) (Kurenai no buta) (1...   5.0\n",
      " Kiki's Delivery Service (Majo no takkyûbin) (1...   5.0\n",
      "                           Tokyo Godfathers (2003)   5.0\n",
      " Lupin III: The Castle Of Cagliostro (Rupan san...   5.0\n",
      "                            Patema Inverted (2013)   5.0\n",
      "\n",
      "recommended movies:\n",
      "                                             title      rate\n",
      "    Grave of the Fireflies (Hotaru no haka) (1988)  3.563662\n",
      "        Ghost in the Shell (Kôkaku kidôtai) (1995)  2.820755\n",
      "             Wind Rises, The (Kaze tachinu) (2013)  1.446478\n",
      "                          Fantastic Mr. Fox (2009)  1.230561\n",
      "                               Perfect Blue (1997)  1.223616\n",
      "                           Moonrise Kingdom (2012)  1.122188\n",
      "                                         Up (2009)  1.109598\n",
      " Ghost in the Shell 2: Innocence (a.k.a. Innoce...  0.999837\n",
      "           Millennium Actress (Sennen joyû) (2001)  0.883533\n",
      "              Ninja Scroll (Jûbei ninpûchô) (1995)  0.850211\n",
      " Wolf Children (Okami kodomo no ame to yuki) (2...  0.846930\n",
      "                               Corpse Bride (2005)  0.817524\n",
      "                             Animatrix, The (2003)  0.811292\n",
      " Triplets of Belleville, The (Les triplettes de...  0.772407\n",
      "                                 Your Name. (2016)  0.771261\n",
      "                                 Persepolis (2007)  0.765392\n",
      "                    Summer Wars (Samâ wôzu) (2009)  0.764066\n",
      "                            Song of the Sea (2014)  0.739205\n",
      " Neon Genesis Evangelion: The End of Evangelion...  0.739014\n",
      "                                   Iron Man (2008)  0.693377\n"
     ]
    }
   ],
   "source": [
    "# predict\n",
    "\n",
    "# X ~= USV^T, transformed = US, svd_components_ = V^T\n",
    "infer = np.matmul(sk_transformed[testId,:], svd.components_)\n",
    "movieId = np.arange(infer.shape[0])\n",
    "infer_df = pd.DataFrame({'rate_infer' : infer, 'movieId' : movieId})\n",
    "\n",
    "rated = mat[testId,:]\n",
    "rated_data = rated.data\n",
    "rated_indices = rated.indices\n",
    "rated_df = pd.DataFrame({'rate_rated' : rated_data, 'movieId': rated_indices})\n",
    "rated_sorted = rated_df.sort_values('rate_rated',ascending=False)\n",
    "to_print = rated_sorted.rename(columns={'rate_rated': 'rate'})\n",
    "print(\"highly rated movies of user {}:\".format(testId))\n",
    "print_movies(to_print.head(20))\n",
    "\n",
    "# exclude already rated movies\n",
    "tmp_df = pd.merge(infer_df, rated_df, on='movieId', how='outer')\n",
    "not_rated_df = tmp_df[pd.isnull(tmp_df['rate_rated'])]\n",
    "not_rated_sorted = not_rated_df.sort_values('rate_infer',ascending=False)\n",
    "to_print = not_rated_sorted.rename(columns={'rate_infer': 'rate'})\n",
    "print(\"\")\n",
    "print(\"recommended movies:\")\n",
    "print_movies(to_print.head(20))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
